How to Print Floating-Point Numbers Accurately 





Guy L. Steele Jr. 

Jon L White 

m 

Thinking Machines Corporation 

Lucid, Inc. 

. , 

245 First Street 

707 Laurel Street 

•• 1 

Cambridge, Massachusetts 02142 

Menlo Park, California 94025 

ty M 

glsathink.com 

j oniaiucid . com 



Abstract 

We present algorithms for accurately converting 
floating-point numbers to decimal representation. The 
key idea is to carry along with the computation an ex- 
plicit representation of the required rounding accuracy. 

We begin with the ’Simpler problem of converting 
fixed-point fractions. A modification of the well-known 
algorithm for radix-conversion of fixed-point fractions 
by multiplication explicitly determines when to termi- 
nate the conversion process; a variable number of digits 
are produced. The algorithm has these properties: 

• No information is lost; the original fraction can be 
recovered from the output by rounding. 

• No “garbage digits” are produced. 

• The output is correctly rounded. 

• It is never necessary to propagate carries on 
rounding. 

We then derive two algorithms for free-format out- 
put of floating-point numbers. The first simply scales 
the given floating-point number to an appropriate frac- 
tional range and then applies the algorithm for fractions. 
This is quite fast and simple to code but has inaccura- 
cies stemming from round-off errors and oversimplifica- 
tion. The second algorithm guarantees mathematical 
accuracy by using multiple-precision integer arithmetic 
and handling special cases. Both algorithms produce no 
more digits than necessary (intuitively, the “1.3 prints 
as 1.2999999” problem does not occur). 

Finally, we modify the free-format conversion al- 
gorithm for use in fixed-format applications. Infor- 
mation may be lost if the fixed format provides too 
few digit positions, but the output is always correctly 
rounded. On the other hand, no “garbage digits” are 
ever produced, even if the fixed format specifies too 
many digit positions (intuitively, the “4/3 prints as 
1.333333328366279602” problem does not occur). 
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1 Introduction 


Isn’t it a pain when you ask a computer to divide 1.0 by 
10.0 and it prints 0.0999999? Most often the arithmetic 
is not to blame, but the printing algorithm. 

Most people use decimal numerals. Most contempo- 
rary computers, however, use a non-decimal internal HP 
representation, usually based on powers of two. This 
raises the problem of conversion between the internal | 
representation and the familiar decimal form. f:.® 

The external decimal representation is typically used ‘ 
in two different ways. One is for communication with : j 
people. This divides into two cases. For fixed-format 
output, a fixed number of digits is chosen before the 
conversion process takes place. In this situation the pri- 
mary goal is controlling the precise number of charac- 
ters produced so that tabular formats will be correctly 
aligned. Contrariwise, in free-format output the go 
is to print no more digits than are necessary to com- 
municate the value. Examples of such applications are gg| 
Fortran list-directed output and such interactive lan- 
guage systems as Lisp and APL. Here the number of| 
digits to be produced may be dependent on the value j 
to be printed, and so the conversion process itself must 
determine how many digits to output. 

The second way in which the external decimal rep- ; 
resentation is used is for inter-machine communication. ; 
The decimal representation is a convenient and conven- 
tional machine-independent format for transferring nu- 
merical information from one computer to another. Us- 
ing decimal representation avoids incompatibilities of 
word length, field format, radix (binary versus hexadec- 
imal, for example), sign representation, and so on. Here 
the main objective is to transport the numerical value 
as accurately as possible; that the representation is also 
easy for people to read is a bonus. (The IEEE 754 , 
floating-point standard [IEEE85] has ameliorated this 
problem by providing standard binary formats, but the ; 
VAX, IBM 370, and Cray-1 are still with us and will be 
for yet a while longer.) 1 . 
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We deal here with the output problem: methods of 
converting from the internal representation to the exter- 
nal representation. The corresponding input problem is 
also of interest and is addressed by another paper in this 
conference [£linger90]. The two problems are not quite 
identical because we make the asymmetrical assumption 
that the internal representation has a fixed number of 
digits but the external representation may have a vary- 
ing number of digits. (Compare this with Matula’s ex- 
cellent work, in which both representations are assumed 
to be of fixed precision [Matula68] [Matula70].) 

In the following discussion we assume the existence 
of a program that solves the input problem: it reads 
a number in external representation and produces the 
input representation whose exact value is closest, of all 
possible internal representation values, to the exact nu- 
merical value of the external number. In other words, 
this ideal input routine is assumed to round correctly in 
all cases. 

We present algorithms for fixed-point-fraction output 
and floating-point output that decide dynamically what 
is an appropriate number of output digits to produce 
for the external representation. We recommend the last 
algorithm, Dragon4, for general use in compilers and 
run-time systems; it is completely accurate and accom- 
modates a wide variety of output formats. 

We express the algorithms in a general form for con- 
verting from any radix b (the internal radix) to any 
other radix B (the external radix); 6 and B may be any 
two integers greater than 1, and either may be larger 
than the other. The reader may find it helpful to think 
of B as being 10, and b as being either 2 or 16. Infor- 
mally we use the terms “decimal” and “external radix” 
interchangeably, and similarly “binary” and “internal 
radix”. We also speak interchangeably of digits being 
“output” or “printed”. 


Properties of Radix Conversion 


We assume that a number to be converted from one 
radix to another is perfectly accurate and that the goal 
is to express that exact value. Thus we ignore issues of 
errors (such as floating-point round-off or truncation) in 
the calculation of that value. 

The decimal representation used to communicate a 
numerical value should be accurate. This is especially 
important when transferring values between comput- 
ers. In particular, if the source and destination com- 
puters use identical floating-point representations, we 
would like the value to be communicated exactly; the 
destination computer should reconstruct the exact bit 
pattern that the source computer had. In this case we 
require that conversion from internal form to external 
form and back be an identity function; we call this the 


internal identity requirement. We would also like con- 
version from external form to internal form and back to 
be an identity; we call this the external identity require- 
ment. However, there are some difficulties with these 
requirements. 

The first difficulty is that an exact representation is 
not always possible. A finite integer in any radix can 
be converted into a finite integer in any other radix. 
This is not true, however, of finite-length fractions ; a 
fraction that has a finite representation in one radix may 
have an indefinitely repeating representation in another 
radix. This occurs only if there is some prime number 
p that divides the one radix but not the other; then 
1/p (among other numbers) has a finite representation 
in the first radix but not in the second. (It should be 
noted, by the way, that this relationship between radices 
is different from the relationship of incommensurability 
as Matula defines it [Matula70], For example, radices 
10 and 20 are incommensurable, but there is no prime 
that divides one and not the other.) 

It does happen that any binary, octal, or hexadeci- 
mal number has a finite decimal representation (because 
there is no prime that divides 2 fc but not 10). Therefore 
when 5 = 10 and 6 is a power of two we can guarantee 
the internal identity requirement by printing an exact 
decimal representation of a given binary number and 
using an ideal rounding input routine. 

This solution is gross overkill, however. The exact 
decimal representation of an n-bit binary number may 
require n decimal digits. However, as a rule fewer than 
n digits are needed by the rounding input routine to 
reconstruct the exact binary value. For example, to 
print a binary floating-point number with a 27-bit frac- 
tion requires up to 30 decimal digits. (A 27-bit frac- 
tion can represent a 30-bit binary fraction whose dec- 
imal representation has a non-zero leading digit; this 
is discussed further below.) However, 10 decimal dig- 
its always suffice to communicate the value accurately 
enough for the input routine to reconstruct the 27- 
bit binary representation [Matula68], and many par- 
ticular values can be expressed accurately with fewer 
than 10 digits. For example, the 27-bit binary floating- 
point number .110011001100110011001100110 2 x 2" 3 , 
which is equal to the 29-bit binary fixed-point frac- 
tion .00011001100110011001100110011 2 , is also equal to 
the decimal fraction .09999999962747097015380859375, 
which has 29 digits. However, the decimal fraction 0.1 
is closer in value to the binary number than to any other 
27-bit floating-point binary number, and so suffices to 
satisfy the internal identity requirement. Moreover, 0.1 
is more concise and more readable. 

We would like to produce enough digits to preserve 
the information content of a binary number, but no 
more. The difficulty is that the number of digits needed 
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depends not only on the precision of the binary number 
but on its value. 

Consider for a moment the superficially easier prob- 
lem of converting a binary floating-point number to 
octal. Suppose for concreteness that our binary num- 
bers have a six-bit significand. It would seem that two 
octal digits should be enough to express the value of 
the binary number, because one octal digit expresses 
three bits of information. However, the precise oc- 
tal representation of the binary floating-point number 
.IOIIOI 2 x 2 -1 is .264a x 8°. The exponent of the bi- 
nary floating-point number specifies a shifting of the 
significand so that the binary point is “in the middle” 
of an octal digit. The first octal digit thus represents 
only two bits of the original binary number, and the 
third only one bit. We find that three octal digits are 
needed because of the difference in “graininess” of the 
exponent (shifting) specifications of the two radices. 

The number N of radix-5 digits that may be required 
in general to represent an'n-digit radix-6 floating-point 
number is derived in [Matula68], The condition is: 


This may be reformulated as: 

„ n 

N> i H + 1 

log 6 5 

and if N is to be minimized we therefore have: 

N = 2+ [n/(log fc 5)J 

No more than this number of digits need ever be output, 
but there may be some numbers that require exactly 
that many. 

As an example, in converting a binary floating-point 
number to decimal on the PDP-10 [DEC73] we have 
& = 2, 5 = 10, and n = 27. One might naively sup- 
pose that 27 bits equals 9 octal digits, and if nine octal 
digits suffice surely 9 decimal digits should suffice also. 
However, using Matula’s formula above, we find that to 
print such a number may require 2 + [27/(log 3 10)J = 
2+ [27/(3.3219...)] = 2+ [8.1278... J = 10 decimal 
digits. There are indeed some numbers that require 10 
digits, such as the one which is greater than the rounded 
27-bit floating-point approximation to 0.1 by two units 
in the last place. On the PDP-10 this is the number 
represented in octal as 1756314631508, which represents 
the value .63146315a x 8 -1 . It is intuitively easy to see 
why ten digits might be needed: splitting the number 
into an integer part (which is zero) and a fractional part 
must shift three zero bits into the significand from the 
left, producing a 30-bit fraction, and 9 decimal digits is 
not enough to represent all 30-bit fractions. 

The condition specified above assumes that the con- 
versions in both directions round correctly. There are 




other possibilities, such as using truncation (always * 
rounding towards zero) instead of true rounding. If th e ' ? 
input conversion rounds but the output conversion trun- li 
cates, the internal identity requirement can still be met 
but more external digits may be required; the condition ^ 
is B N ~ l > 2 x 6" — 1. If conversion truncates in both §£ 
directions, however, it is possible that the internal iden- I 
tity requirement cannot be met, and that repeated con- l| 
versions may cause a value to “drift” very far from its jH 
original value. This implies that if a data base is main- ^ 
tained in decimal, and is updated by converting it to bi- is 
nary, processing it, and then converting back to decimal, ? 
values not updated may nevertheless be changed, after M 
many iterations, by a substantial amount [Matula70]. J® 
That is why we assume and require proper rounding. ' 
Rounding can guarantee the integrity of the data (in. S> 
the form of the internal identity requirement) and also ^ 
will minimize the number of output digits. Truncation 
cannot and will not. 

The external identity requirement obviously cannot 
be strictly satisfied, because by assumption there are a 
finite number of internally representable values and an 
infinite number of external values. For every internal 
value there are many corresponding external representa- 
tions. From among the several external representations 
that correspond to a given internal representation, we 
prefer to get one that is closest in value but also as short 
as possible. 

Satisfaction of the internal identity requirement im 
plies that an external consistency requirement be met 
if an internal value is printed out, then reading that ex- 
act same external value and then printing it once more 
will produce the same external representation. This 
is important to the user of an interactive system. If 
one types PRINT 3.1415926535 and the system replies 
3.1415926, the user might think “Well and good; the 
computer has truncated it, and that’s all I need type 
from now on.” The user would be rightly upset to 
then type in PRINT 3 . 1415926 and receive the response 
3 . 1415925! Many language implementations indeed ex- 
hibit such undesirable “drifting” behavior. 

Let us formulate our desires. First, in the interest 
of brevity and readability, conversion to external form 
should produce as few digits as possible. Now suppose, 
for some internal number, that there are several exter 
nal representations that can be used: all have the same 
number of digits, all can be used to reconstruct the in- 
ternal value precisely, and no shorter external number 
suffices for the reconstruction. In this case we prefer the 
one that is closest to the value of the binary number 
(As we shall see, all the external numbers must be alike 
except in the last digit; satisfying this criterion there- 
fore involves only the correct choice for the last digit to 
be output.) <t}: 
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On the other hand, suppose that the number of dig- 
its to be output has been predetermined (fixed-format 
output); then we desire that the printed value be as 
close as possible to the binary value. In short, the ex- 
ternal form should always be correctly rounded. (The 
Fortran 77 standard requires that floating-point output 
be correctly rounded [ANSI76, 13.9.5.2]; however, many 
Fortran implementations do round not correctly, espe- 
cially for numbers of very large or very small magni- 
tude.) 

We first present and prove an algorithm for printing 
fixed-point fractions that has four useful properties: 

(a) Information preservation. No information is lost 
in the conversion. If we know the length n of the 
original radix-6 fraction, we can recover the original 
fraction from the radix- f? output by converting back 
to radix-6 and rounding to n digits. 

(b) Minimum-length output. No more digits than nec- 
essary are output td achieve (a). (This implies that 
the last digit printed is non-zero.) 

(c) Correct rounding. The output is correctly rounded; 
that is, the output generated is the closest approx- 
imation to the original fraction among all radix-!? 
fractions of the same length, or is one of two closest 
approximations, and in the latter case it is easy to 
choose either of the two by any arbitrary criterion. 

(d) Left-to-right generation. It is never necessary to 
propagate carries. Once a digit has been generated, 
it is the correct one. 

These properties are characterized still more rigorously 
in the description of the algorithm itself. 

From this algorithm we then derive a similar method 
for printing integers. The two methods are then com- 
bined to produce two algorithms for printing floating- 
point numbers, one for free-format applications and one 
for fixed-format applications. 


3 Fixed-Point Fraction Output 



Following [Knuth68], we use the notation [zj to mean 
the greatest integer not greater than z. Thus |_3.5j = 3 
and (_ — 3.5J = —4. If * is an integer, then [x\ = x. 
Similarly, we use the notation |"z] to mean the smallest 
integer not less than z; [z] = — zj. Also following 
[Knuth68], we use the notation {z} to mean z mod 1, 
or z — |_zj , the fractional part of z. We always indicate 
numerical multiplication of a and 6 explicitly as a x 6, 
because we use simple juxtaposition to indicate digits 
in a place-value notation. 

The idea behind the algorithm is very simple. The 
potential error in a fraction of limited precision may be 
described as being equal to one-half the minimal non- 
zero value of the least significant digit of the fraction. 
The algorithm merely initializes a variable M to this 



value to represent the precision of the fraction. When- 
ever the fraction is multiplied by B, the error M is 
also multiplied by B. Therefore M always has a value 
against which the residual fraction may be meaningfully 
compared to decide whether the precision has been ex- 
hausted. 

Algorithm (FP) 3 : 

Finite-Precision Fixed-Point Fraction Printout 1 

Given: 

• An n-digit radix-6 fraction /, 0 < / < 1: 

/ — * f— if —if —3 - ■ • f-n — ^2 f-i x 6 

«=i 

• A radix B (an integer greater than 1). 

Output: The digits F { of an W-digit ( N > 1) radix-5 
fraction F: 

N 

F = .F_ l F_iF_ 3 ...F_ N B~ i 

»= 1 

such that: 

(a) \F-f\<^- 

(b) N is the smallest integer > 1 such that (a) can be 
true. 

B-n 

(c) \F-f\<Z— 

(d) Each digit is output before the next is generated; 
it is never necessary to “back up” for corrections 

Procedure: see Table 1. I 

The algorithmic procedure is expressed in pseudo- 
ALGOL. We have used the loop- while-rep eat (“n + L 
loop”) construct credited to Dahl [Knuth74]. The loop 
is exited from the while-point if the while-condition 
is ever false; thus one may read “while B: n as 
“if not B then exitloop fi”. The cases statement 
is to be taken as a guarded if construction [Dijkstra76]; 
the branch labeled by the true condition is executed, 
and if both conditions are true (here, if R = |) either 
branch may be chosen. The ambiguity in the cases 
statement here reflects the point of ambiguity when 

1 It is difficult to give short, meaningful names to each of a 
group of algorithms that are similar. Here we give each algorithm 
a long name that has enough adjectives and other qualifiers to 
distinguish it from the others, but these long names cue unwieldy. 
For convenience, and as a bit of a joke, two kinds of abbreviated 
names are used here. Algorithms that are intermediate versions 
along a path of development have names such as "(FP) 3 " that are 
effectively contracted acronyms for the long names. Algorithms 
that are in "final form” have names of the form "Dragonle" for 
integers fc; these algorithms have long names whose acronyms 
form sequences of letters “F" and "P” that specify the shape of 
so-called "dragon curves" [Gardner77], 
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Table 1: Procedure for Finite-Precision Fixed-Point 
Fraction Printout ((FP) 3 ) 

begin 
k <-0; 


R fi 



M*~ b- 

- n /2; 


loop 



k <- 

k + 1; 


U — 

[i2 x 


R — 

{i2 X 

BY, 

M <— M x 

B-, 


while R> M and R < 1 — M 
F- k «- U-, 
repeat; 


U V' 


R < F. k 

R>\: F_ k 
endcases; 

N Jb; 

end 


two radix-5 representations of length n are equidistant 
from / . A given implementation may use any decision 
method when R = L, such as “always U” , which effec- 
tively means to round down; “always If -4-1”, meaning to 
round up; or “if U = 0 (mod 2) then U else U+l”, 
meaning to round so that the last digit is even. Note, 
however, that using “always U” does not always round 
down if rounding up will allow one fewer digit to be 
printed. 

This method is a generalization of the one presented 
by Taranto [Taranto59] and mentioned in exercise 4.4.3 
of [Knuth69]. 2 

4 Proof of Algorithm (FP) 3 

(The reader who is more interested in practical appli- 
cations than in details of proof may wish to skip this 
section.) 

In order to demonstrate that Algorithm (FP) 3 satis- 
fies the four claimed properties (a)-(d), it is useful first 
to prove, by induction on k, the following invariants true 
at the top of the loop, body: 

b~ n x B k 


2 By the way, the paper that was forward- referenced in the 
answer to exercise 4.4.3 in [Knuth81] was an early draft of this 
paper that contained only Algorithm (FP) 3 , its proof, and a not- 
yet-correct generalization to floating-point numbers. 


K 

R k x B~ k + ^ x B~' = / 

i=i ' .#'4 

By M k and R k we mean the values of M and R at the 
top of the loop as a function of k (the values of M and 
R change if and only if k is incremented). Invariant (t) 
is easily verified; we shall take it for granted. Invariant 1 
(»») is a little more complicated. 

Basis. After the first two assignments in the proce- 
dure, k = 0 and Rq = f . The summation in ( ii ) has no 
summands and is therefore zero, yielding 

.‘.V 

Ro x B 0 + Q — f j 

which is true, because initially R — f . [ 

Induction. Suppose (ii) is true for k. We note that 
the only path backward in the loop sets F_, k+1 <, 

[12* x B \ . It follows that: 

k - ( 

f=R k xB~ k + Y J F- i xB~ i 

* = 1 

k 

= (Bx R k ) x 5"( l+1 ) + F_i x B~ { 

*=i 

k \ ,, 

= {{R k x B} + lR k x B\)x 5-(* +1 > + 

1=1 

= {R k +i + F_ (k+1) ) x R-i x B~\ 

i=l 

*+l 1*. 

= R k+l x + £ F_i x B-< 1 J 






p| 

s? : 




W?: 


if, 

m 


§1 


|L 

m 


■ ‘i* 9 

which establishes the desired invariant. 

The procedure definitely terminates, because M ini 
tially has a strictly positive value and is multiplied by 
B (which is greater than 1) each time through the loop 
Eventually we have M > L, at which point R > M 
and R < 1 — M cannot both be true and the loop must 
terminate. (Of course, the loop may terminate before 
M > i, depending on 12.) „’j 

When the procedure terminates, there may be one 
two cases, depending on Rpi. Because of the rounding 
step, we have one of the following: 


ill? 


m. 


f = F + R n x B~ n 
f = F - (1 - R n ) x B~ n 
Let us define R m as follows: 

R‘ = R n 
12* = 1 - R n 


if Rn < 2 J 1 
if I2jv > \ 




if Rn < i i 
it R N > i | 


Because 0 < i2„ < 1, we know that 0 < 12* < L; we can 
therefore -write: .ii 
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B- 


by 

op. 
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ust 
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ing 


/ - F = R" x B~ 
F-f = R*xB- 
From this we have: 


if Rn < \ 
if Rn>\ 


}F-f\ = R m x B~ n < ~~ 

proving property (c) (correct rounding). 

To prove property (a) ( information preservation), we 
consider the two cases Mn > 5 and M n < i. If we 
have Mn > then 


\F~f\< 


< M n x B~ n = 


B n x b~ n 

because Mn = - • If we have Mn < 5, then 

it J 

with the fact that Rn < Mn or Rn > 1 — Mn we have 
R ' < Mn, and so 

\F - f\ = R* x B~**< x B~ n = — 

2 

Property (a) therefore holds in sill cases. 

To prove property (b) ( minimum-length, output), let 
us suppose that, to the contrary, there is some P-digit 
radix-JB fraction G such that: 

P <N and \G-f\< h -^- 

In fact, we assume P — N — 1 without loss of generality 
by allowing G to have trailing zero digits to fill out to 
P places. Now from invariant (ti) we know that: 

p 

R P x B~ p + F -i x = f 

«=i 

It follows easily, because B and the P_; are integers and 
0 < R P < 1, that the closest P-digit representations to 
/ are 

G\ = • P_iP_ 3 P _3 ■ • • F-(p-i)F-p 
and 

Gj = . P— 1 P_ 2 P—3 . . . P_(p_i)(P_p + 1). 

But because the iteration in the algorithm did not ter- 
minate we know that: 


Rp > Mp — 


b~ n x B p 


Rp < (1 — Mp). 

It follows that whether G is chosen to be G\ or G2 (and 
any other choice is even worse) that: 

\G-f\>MpxB~ p = ~ 


producing a contradiction and proving property (b). 

Property (d) ( left-to-right generation) follows imme- 
diately, because the only digit that is ever corrected 
(by adding one) is the last one output. If that correc- 
tion were to produce a carry (i.e., U = B — 1 , and so 
U + 1 = B) then it would mean that the TV -digit radix- 
B output would end in a zero digit, implying that a 
shorter radix- P fraction could have satisfied property 
(a); but this is not possible by property (b). 

The algorithms that follow are all variants of the one 
just proven. No other proofs are presented below; the 
necessary ideas for proving the remaining algorithms are 
simple modifications of those in the proof just given. 
However, the code given for later algorithms contains 
assertions of important invariants; from these invariants 
complete proofs may be reconstructed. 

5 Some Observations 

The same routine can be used for converting fractions of 
various lengths (handling, for example, both single and 
double precision) provided that the radix-6 arithmetic 
is sufficiently accurate for the longest. The only part of 
the algorithm dependent on n is the initialization of M . 

All of the arithmetic is easy to perform in radix 6; 
the only operations used are multiplication, integer and 
fractional part, subtraction from 1, and comparison. 
There is a difficulty if the radix 6 is odd, because the 
initialization of M requires division by 2 ; if the prob- 
lem should arise, one can either do the arithmetic in 
an even radix b’ that is a multiple of b, or initialize M 
to 6 -n (instead of b~ n / 2) and change the comparison 
R> M /\ R < \ — M to 2 x R> M /\2 x R <2 — M. 

The accuracy required for the arithmetic is n -(- 1 + 
[log;,(B — 1)] radix-6 digits: n is the length of the frac- 
tion (and of all remainders), 1 more is needed for the 
factor of | in M, and [log 1) ( J B — 1 )] more are needed 
for the result of the multiplication to produce a radix- B 
digit in radix 6. 

Algorithm (FP) 3 is (almost) suitable for output of 
floating-point numbers; it can be used if the floating- 
point number is first scaled properly. 

Algorithm Dragon 2 : 

Floating-Point Printout 

Given: 

• A radix-6 floating-point number v — f x b^ e ~ v \ 
where e, /, and p are integers such that p > 0 and 
0 < / < b*. 

• A radix B (an integer greater than 1 ). 

Output: A radix- .B floating-point approximation to Q, 
using exponential notation if Q is very large or very 
small. 
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Procedure: If / = 0 then simply print “0.0”. Otherwise 
proceed as follows. First compute v' = v ® B~ x , where 
“®” denotes floating-point multiplication and where * 
is chosen so as to make v' < b p . (If exponential for- 
mat is to be'used, one normally chooses * so that the 
result either is between 1/2? and 1 or is between 1 and 
B.) Next, print the integer part of v 1 by any suitable 
method, such as the usual division-remainder technique; 
after that print a decimal point. Then take the frac- 
tional part / of v\ let n = p — [logs v '\ — 1> and apply 
Algorithm (FP) 3 to / and n. Finally, if x was not zero, 
the letter “E” and a representation of the scale factor x 
are printed. I 

We note that if a floating-point number is the same 
size (in bits, or radix-6 digits, or whatever) as an inte- 
ger, then arithmetic on integers of that size is often suffi- 
ciently accurate, because the bits used for the floating- 
point exponent are usually more than enough for the 
extra digits of precision required. 

This method for floating-point output is not entirely 
accurate, but is very simple to code, typically requir- 
ing only single-precision integer arithmetic. For appli- 
cations where producing pleasant output is important, 
program and data space must be minimized, and the in- 
ternal identity requirement may be relaxed, this method 
is excellent. An example of such an application might 
be an implementation of BASIC for a microcomputer, 
where pleasant output and memory conservation are 
more important than absolutely guaranteed accuracy. 
[The last sentence was written circa 1981. Nowadays all 
but the tiniest computers have the memory space for the 
full algorithm.] This algorithm was used for many years 
in the MacLisp interpreter and found to be satisfactory 
for most purposes. 

There are two problems that prevent Algorithm 
Dragon2 from being completely accurate for floating- 
point output. 

The first problem is that the spacing between adja- 
cent floating-point numbers of a given precision is not 
uniform. Let v be a floating-point number, and let v~ 
and v + be respectively the next-lower and next-higher 
floating-point numbers of the same precision. For most 
values of v, we have v + — v = v — v~. However, if v is an 
integral power of 6, then instead v + — v = b x (v — v ~ ); 
that is, the gap below v is smaller than the gap above 
v by a factor of 6. 

The second problem is that the scaling may cause 
loss of information. There may be two numbers, very 
close together in value, which when scaled by the same 
power of B result in the same scaled number, because of 
rounding. The problem is inherent in the floating-point 
representation. See [Matula68] for further discussion of 
this phenomenon. 


Table 2: Procedure for Indefinite Precision Integer 
Printout ((IP) S ) _ ^ jjjl 


he f “ q ' *|f| 

Ri-d; 

Jlf *— 6 n ; 

5 «— 1; 

loop " — 

5 «— S x B \ 

k *— k + 1; '"w 

while (2 x R) + M > 2 x S : 
repeat; 

JT-Jb-l; 

assert S = 2?( ff+1 ) O i|| 

loop i 

k «— k — 1; 

5 — S/B\ 

U-[R/S\; 

R *— R mod S; 

while 2 x R > M and 2xR<(2xS) — M : 'f 

°k - U- 

repeat; .. ” 


■■Twaff 


■tS 


Mil 


m 




2 x R < S : Dk *— U\ 

2 x R> S : D k *-U + 1; 




endcases; 

JV -Jfc; 
end 


■ :iia 


6 Conversion of Integers 

To avoid the round-off errors introduced by scaling' 
floating-point numbers, we develop a completely 
curate floating-point output routine that performs no 
floating-point computations. As a first step, we exhibit 
an algorithm for printing integers that has a termi: 
tion criterion similar to that of Algorithm (FP) 3 . 


m 


Algorithm (IP) 2 : '•/! 

Indefinite Precision Integer Printout 

Given: 

• An h-digit radix-6 integer d, accurate to position 
n (» > 0): , ;M 




d = d K d h _i ...d n+l d n (n zeros). = X V 


mk :i 


A radix B. 
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Output: Integers H and N (H > N > 1) as an H- 
digit radix-5 integer D: 


D = DhDh-i • ■ ■ Dn +1 Dn(N zeros). = ^ D{ x 5‘ 


such that: 

(a) \D-d\<- 

(b) N is the largest integer, and H the smallest, such 
that (a) can be true. 

B n 

(c) \D — d\ < — 

(d) Each digit is output before the next is generated. 
Procedure: see Table 2. I 

In Algorithm (IP) 2 all quantities are integers. We 
avoid the use of fractional quantities by introducing a 
scaling factor S and logically replacing the quantities R 
and M by R/S and M/S. Whereas in Algorithm (FP) 3 
the variables R and M were repeatedly multiplied by 5, 
in Algorithm (IP) 2 the variable S is repeatedly divided 
by 5. 


Free-Format Floating-Point Output 


From these two algorithms, (FP) 3 and (IP) 2 , it is now 
easy to synthesize an algorithm for “perfect” conversion 
of a (positive) fixed-precision floating-point number to a 
free-format floating-point number in another radix. 
shall assume that a floating-point number / is repre- 
sented as a tuple of three integers: the exponent e, the 
“fraction” or “significand” m, and the precision p. To- 
gether these integers represent the mathematical value 
f = mx b( c ~ p \ We require m < b p \ a normalized rep- 
resentation additionally requires m > b( p ~ 1 ), but we do 
not depend on this. 

We define the function shifty of two integer arguments 
* and n (x> 0): 


shift b(x,n) = x 6"J 


This function is intended to be trivial to implement in 
radix-6 arithmetic: it is the result of shifting x by n 
radix-6 digit positions, to the left for positive n (intro- 
ducing low-order zeros) or to the right for negative n 
(discarding digits shifted to the right of the “radix-6 
point”). 


Algorithm (FPP) 2 : 

Fixed-Precision Positive Floating-Point Printout 

Given: 


• Three radix-6 integers e, /, and p, representing the 
number v = fxb^‘~ p \ withp > 0 and 0 < / < b p , 


• A radix B. 


Output: Integers H and N and digits D k (H > k > N) 
such that if one defines the value 


= J2DixB' 


/ . V~ + V V + v + 

(*) — <“< — 

(b) H is the smallest integer (that is, furthest from oo) 
and N the largest integer (that is, furthest from 
-oo) such that (a) can be true 

■dN 

( C ) |f_„|<£_ 


(d) Each digit is output before the next is generated. 
Procedure: see Tables 3 and 4. I 

Unfortunately, this algorithm requires / 0; it does 

not work properly for zero. Notice that relationship 
(a) is not stated in the symmetric form \V — v\ < x. 
The relationship is not symmetrical because of the phe- 
nomenon of unequal gaps. 

As in algorithm (IP) 2 , all arithmetic operations pro- 
duce integer results and all variables take on integer 
values. This is done by using the scale factor S = 
shift b (l, — (e— p)) = [6( p-e )j. Note, however, that some 
of these integer values may be very large. If this algo- 
rithm is used to print (in decimal) floating-point num- 
bers in IEEE standard single-precision floating-point 
format [IEEE81, IEEE85], integers as large as 2 154 may 
be calculated; for double-precision format integers as 
large as 2 1050 may be encountered. Such a large integer 
can be represented in fewer than five 32-bit words (for 
single precision) or forty 32-bit words (for double pre- 
cision). While multi-precision integer arithmetic is re- 
quired in the general case the storage requirements are 
certainly not impractical. The multiprecision arithmetic 
does not have to be quite as general as that presented 
in [Knuth69, §4.3.1], One must add, subtract, and com- 
pare multiprecision integers; multiply and divide multi- 
precision integers by 5; and divide one multiprecision 
integer by another (but only in situations where it is 
known that the result will be a radix-5 digit, which is 
much simpler than the general case). The size of the in- 
tegers involved depends primarily on the exponent value 
of the floating-point number to be printed. For floating- 
point numbers of reasonable magnitude, 32-bit or 64-bit 
integer arithmetic is likely to suffice and so execution 
speed is typically not a problem. 

The successor and predecessor v~ to a positive 
floating-point number v-fx b^~ p> > are taken to be 


v + = v + b(‘~ p ) 


A 

= if / = b( p ~^ then v - b^‘- p ~ 1 '> else v - 6 (e_p 





Table 3: Fixed-Precision 

Floating-Point Printout ((FPP) 2 ) 

begin 

assent / ^ 0 

R «- shift b (f, max(e - p, 0)); 

S — shift b (l, max(0, -(e - p))); 
assert R/S = / x &( e-p ) = v 
M~ <— shift b (l, max(e — p, 0)); 

Af + <— Af - ; 

Simple-Fixup; 

H *-k - 1; 

loop 


assert (R/S) x B k + ^ Di x B' = v 


U^[(RxB)/S\; 

R <— (R x B) mod S; 

Af- *— Af- x B; 

Af + «— Af + x B; 
low «— 2 x B < Af - ; 

Aifffc «— 2xB>(2xS) — Af + ; 
while (not low) and (not high) : 

D k <- tf; 

repeat; 

H 

comment Let V fc = ^ Di x B*. 


Positive 


+ u 
2 


assert low => — - — < (U x B k + V k ) < v 

A 

assert high => v < (( U + 1) x B k + V fc ) < - — — 


low and not high : D k *— U; 
high and not low : D k *— U + 1; 
low and high : 


Table 4: Procedure Simple-Fixup 

procedure Simple-Fixup; 
begin 

assert R/S = v 

assert M~/S = M + /S — &( e “ p ) 
if / = ahift b (l,p - 1) then 
M + - shift b (M + , 1); 

R «- shift b (R, 1 ); 

S «- shift b (S , 1 ); 

fi; 

k <- 0; 
loop 

assert (B/S) x B k = v 
assert ( M~/S ) x B k = v — v~ 
assert (M + /S) x B k — v + — u 
while B < [S/B] : 
fc ♦— fc — 1; 

B * — B x B; 

Af - <— Af - x B; 

Af + «— Af + x B; 
repeat; 

assert fc = min(0, 1 + [log s vj) 
loop 

assert (R/S) x B k = v 
assert (M~/S) x B k = v — re- 
assert (M + /S) x B k — v + — v 
while (2 x B) + Af + > 2 x S : 

5 *— S x B; 

fc «— fc + 1; 

repeat; 

v -f- u + I 

assert k = 1 + log s 


2 x B < S : D k <-U; 

2 x B > S : I)* — U + 1; 
endcases; 
endcases; 

N *-k; 

end; 


1 


The formula for v + does not have to be conditional, 
even when the representation for u + requires a larger 
exponent e than v does in order to satisfy f < b p . The 
formula for however, must take into account the sit- 
uation where / = because v~ may then use the 

next-smaller'value for e, and so the gap size is smaller by 
a factor of b. There may be some question as to whether 
this is the correct criterion for floating-point numbers 
that are not normalized. Observe, however, that this 
is the correct criterion for IEEE standard floating-point 
format [IEEE85], because all such numbers are normal- 
ized except those with the smallest possible exponent, 
so if v is denormalized then v~ must also be denormal- 
ized and have the same value for the exponent e. 

To compensate for the phenomenon of unequal gaps, 
the variables M ~ and M + are given the value that M 
would have in Algorithm (IP) 2 , and then adjustments 
are made to M + , R, and 5 if necessary. The simplest 
way to account for unequal gaps would be to divide 
M~ by b. However, thisMnight cause M~ to have a 
non-integral value in some situations. To avoid this, 
we instead scale the entire algorithm by an additional 
factor of 6, by multiplying M + , R, and 5 by b. 

The presence of unequal gaps also induces an asym- 
metry in the termination test that reveals a previously 
hidden problem. In Algorithm (IP) 2 , with M~ = M + — 
M , it was the case that if R > (2 x S) — M + and 
2 x R < S, then necessarily R < M~ also. Here this 
is not necessarily so, because M~ may be smaller than 
M + by a factor of b. The interpretation of this is that 
in some situations one may have 2 x R < 5 but nev- 
ertheless must round up to the digit 17+1, because 
the termination test succeeds relative to M + but fails 
relative to M~ . 

This problem is solved by rewriting the termination 
test into two parts. The boolean variable low is true 
if the loop may be exited because M~ is satisfied (in 
which case the digit U may be used). The boolean vari- 
able high is true if the loop may be exited because M + 
is satisfied (in which case the digit U + 1 may be used). 
If both variables are true, then either digit of U and 
17+1 may be used, as far as information-preservation 
is concerned; in this case, and this case only, the com- 
parison of 2 x R to 5 should be done to satisfy the 
correct-rounding property. 

Algorithm (FPP) 2 is conceptually divided into four 
parts. First, the variables R, S, M~, and M + are ini- 
tialized. Second, if the gaps are unequal then the neces- 
sary adjustment is made. Third, the radix- B weight H 
of the first digit is determined by two loops. The first 
loop is needed for positive H, and repeatedly multiplies 
S by B; the second loop is needed for negative H, and 
repeatedly multiplies R, M~, and M + by B. (Divisions 
by B are of course avoided to prevent roundoff errors.) 


Fourth, the digits are generated by the last loop; this 
loop should be compared with the one in Table 1. 


8 Fixed-Format Floating-Point Output 


Algorithm (FPP) 2 simply generates digits, stopping 
only after producing the appropriate number of digits 
for precise fxee-format output. It needs more flexible 
means of cutting off digit generation for fixed-format 
applications. Moreover, it is desirable to intersperse 
formatting with digit generation rather than buffering 
digits and then re-traversing the buffer. 

It is not difficult to adapt Algorithm (FPP) 2 for fixed- 
format output, where the number of digits to be out- 
put is predetermined independently of the floating-point 
value to be printed. Using this algorithm avoids giving 
the illusion of more internal precision than is actually 
present, because it cuts off the conversion after suffi- 
ciently many digits have been produced; the remaining 
character positions are then padded with blanks or ze- 


As an example, suppose that a Fortran program 
is written to calculate x, and that it indeed calcu- 
lates a floating-point number that is the best pos- 
sible approximation to x for that floating-point for- 
mat, precise to, say, 27 bits. However, it then prints 
the value with format F20.18, producing the output 
“3 . 141592651605606079”. 

This is indeed accurately the value of that floating- 
point number to that many places, but the last ten 
digits are in some sense not justified, because the in- 
ternal representation is not that precise. Moreover, 
this output certainly does not represent the value of 
x accurately; in format F20.18, x should be printed 
as “3.141592653589793238”. The fxee-format print- 
ing procedure described above would cut off the conver- 
sion after nine digits, producing “3.14159265” and no 
more. The fixed-format procedure to be developed be- 
low would therefore produce “3 . 14159265 ” 

or “3.141592650000000000”. Either of these is much 
less misleading as to internal precision. 

The fixed-format algorithm works essentially uses the 
free-format output method, differing only in when con- 
version is cut off. If conversion is cut off by exhaustion 
of precision, then any remaining character positions are 
padded with blanks or zeros. If conversion is cut off by 
the format specification, the only problem is producing 
correctly rounded output. This is easily almost-solved 
by noting that the remainder R can correctly direct 
rounding at any digit position, not just the last. In 
terms of the programs, almost all that is necessary is to 
terminate a conversion loop prematurely if appropriate, 
and the following cases statement will properly round 
the last digit produced. 
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This doesn’t solve the entire problem, however, be- 
cause if conversion is cut off by the format specification 
then the early rounding may require propagation of car- 
ries; that is, the proof of property (d) fails to hold. This 
can be taken care of by adjusting the values of M ~ and 
M + appropriately. In principle this adjustment some- 
times requires the use of non-integral values that cannot 
be represented exactly in radix 6; in practice such values 
can be rounded to get the proper efFect. 

As noted above, it is indeed not difficult to adapt 
the simplified algorithm for a particular fixed format. 
However, straightforwardly adapting it to handle a va- 
riety of fixed formats is clumsy. We tried to do this, 
and found that we had to introduce many switches and 
flags to control the various aspects of formatting, such 
as total field width, number of digits before and after 
the decimal point, width of exponent field, scale factor 
(as for “P” format specifiers in Fortran), and so on. The 
result was a tangled mesa* very difficult to understand 
and maintain. 

We finally untangled the mess by completely separat- 
ing the generation of digits from the formatting of the 
digits. We added a few interface parameters to produce 
a digit generator (called Dragon4) to which a variety 
of formatting processes could be easily interfaced. The 
generator and the formatter execute as coroutines, and 
it is assumed that the “user” process executes as a third 
coroutine. 

For purposes of exposition we have borrowed cer- 
tain features of Hoare’s CSP (Communicating Sequen- 
tial Processes) notation [Hoare78], The statement 

GENERATE ! (xi , . . . , z n ) 

means that a message containing values *i, . . . , x n is to 
be sent to the GENERATE process; the process that exe- 
cutes such a statement then continues its own execution 
without waiting for a reply. The statement 

FORMAT ?(*!,... ,x„) 

means that a message containing values , . . . , x„ is 
to be received from the FORMAT process; the process 
that executes such a statement waits if necessary for a 
message to arrive. (The notation is symmetrical in that 
the sender and receiver of a message must each know 
the other’s name.) We do not borrow any of the control 
structure notations of CSP, and we do not worry about 
the matter of processes failing. 

To perform floating-point output one must effectively 
execute three coroutines together. In the CSP notation 
one writes: 

[ USER :: user process | 

FORMAT :: formatting process | 

GENERATE :: Dragon 4 ] 



The interface convention is that the “user process” must 
first send the message 

FORMAT! (6, e,f,p,B) , : J|p 

to initiate floating-point conversion and then may 
ceive characters from the FORMAT process until a “(B* 3 
character is received. The “®” character serves as a terSj 
minator and should be discarded. Any of the formatting 1 ’! 
processes shown below may be used; to get free-formal; f 
conversion, for example, one would execute xsl M 

[ USER :: user process | 7^11 

FORMAT :: Free-Formal | 77*^1 

GENERATE :: Dragon 4 ] 

and to get fixed-format exponential formatting one ; 
would execute 

r , 

[ USER :: user process | / 

FORMAT :: Fixed-Format Exponential | jj if' 

GENERATE :: Dragon 4 ] . 

We have found it easy to write new formatting routine 


9 Implementations 

fvshfii 

In 1981 we coded a version of Algorithm Dragon4 in 
Pascal, including the formatting routines shown here as 
well as formatters for Fortran E, F, and G formats and fo 
PL/I-style picture formats such as $$$ ,$$$,$$9.99C 
This suite has been tested thoroughly. This algorit!. 
has also been used in various Lisp implementations fo 
both free-format and fixed-format floating-point output. 
A portable and performance-tuned implementation in 
is in progress. ; : jS| 

10 Historical Note and Acknowledgments * '7 

.. . 

The work reported here was done in the late 1970 
This is the first published appearance, but earlier ver- 
sions of this paper have been circulated “unpublish 
through the grapevine for the last decade. The algi 
rithms have been used for years in a variety of langua; 
implementations, perhaps most notably in Zetalisp. 

Why wasn’t it published earlier? It just didn’t se 
like that big a deal; but we were wrong. Over the ye 
floating-point printers have continued to be inaccurat* 
and we kept getting requests for the unpublished draft 
We thank Donald Knuth for giving us that last requi 
push to submit it for publication. 

The paper has been almost completely revised for pb 
sentation at this conference. An intermediate version 
the algorithm, Dragon3 (Free-Format Perfect Positiv 
Floating-Point Printout), has been omitted from tH* 
presentation for reasons of space; it is of interest only 
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in being closest in form to what was actually first im- 
plemented in MacLisp. We have allowed a few anachro- 
nisms to remain in this presentation, such as the sugges- 
tion that a tiny version of the printing algorithm might 
be required for microcomputer implementations of BA- 
SIC. You will find that most of the references date back 
to the 1960’s and 1970’s. 

Helpful and valuable comments on drafts of this paper 
were provided by Jon Bentley, Barbara K. Steele, and 
Daniel Weinreb. We are also grateful to Donald Knuth 
and Will Clinger for their encouragement. 
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Science and the Artificial Intelligence Laboratory. From 
1978 to 1980, Steele was supported by a Fanny and John 
Hertz Fellowship. 

This work was carried forward by Steele at Carnegie- 
Mellon University, Tartan Laboratories, and Thinking 
Machines Corporation, and by White at IBM T. J. Wat- 
son Research and Lucid, Inc. We thank these institu- 
tions for their support. 

This work was supported in part by the Defense Ad- 
vanced Research Projects Agency, Department of De- 
fense, ARPA Order 3597, monitored by the Air Force 
Avionics Laboratory under contract F33615-78-C-1551. 
The views and conclusions contained in this document 
are those of the authors and should not be interpreted 
as representing the official policies, either expressed or 
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